DETAILED ACTION
Response to Amendment 

1. Applicant has submitted an amendment filed 3/21/2005, amending claims 1, 11,
15, 19, 23, 27, and 31-33, while arguing to traverse the art rejection based on amended
limitations regarding "each of said decoder using speaker model corresponding to a
different one of said speakers" and "provide speech to a speaker independent speech
recognition system and a speaker specific speech recognition system substantially
simultaneously" (see claim amendment). Applicant's arguments have been fully
considered but they are not persuasive. Applicant's invention is drawn to a speech
recognition system, wherein the input speech is analyzed to identify the speaker and, in
response to the identified speaker, speech models corresponding to that speaker are
retrieved for use in a speech recognizer. Referring to figure 3, the speech recognition
results from a plurality of speech recognizers are compared with each other to
determine the best result based on the confidence score. However, neither figure 3 nor
the specification suggests or indicates that each of the plurality of speech recognizers in
figure 3 uses a speaker model corresponding to a different one of the speakers (such
recognizers are commonly referred to as speaker-specific speech recognizers).
Furthermore, figure 4 shows two different types of speech recognizers, a
speaker-independent ASR and a speaker-dependent ASR. There is no connection
between these two types of ASRs and the plurality of ASRs shown in figure 3. Thus, the
examiner concludes that the newly added feature regarding using a speaker model
corresponding to a different one of said speakers is not supported by the specification.
The examiner treats the amended claims as reciting a plurality of speech recognizers
connected in parallel that recognize a common input speech and compare the
recognition results to determine a single best recognition result based on confidence
score, and the prior art of record fully anticipates the claim limitation. Thus, the
previous ground of rejection is maintained.
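
For illustration only, the arrangement the examiner describes (several recognizers decoding a common input in parallel, with the single best result chosen by confidence score) can be sketched as follows. This is a minimal sketch and is not taken from the application or from any cited reference; the names `Hypothesis`, `decode_with_all`, and `select_best` are hypothetical, and each recognizer is assumed to be a plain callable that returns its own confidence score.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    """One recognizer's decoded output (hypothetical structure)."""
    recognizer: str
    text: str
    confidence: float


def decode_with_all(audio: str,
                    recognizers: List[Callable[[str], Hypothesis]]) -> List[Hypothesis]:
    # Every recognizer receives the same input; the calls run one after
    # another here, standing in for recognizers connected in parallel.
    return [decode(audio) for decode in recognizers]


def select_best(hypotheses: List[Hypothesis]) -> Hypothesis:
    # Compare the results and keep the one with the highest confidence score.
    if not hypotheses:
        raise ValueError("no hypotheses to compare")
    return max(hypotheses, key=lambda h: h.confidence)
```

Nothing in this sketch requires that each recognizer use a model for a different speaker, which is the distinction the examiner draws above.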

2. Referring to applicant's argument based on the second amended limitation
regarding providing speech to a speaker independent speech recognition system and a
speaker specific speech recognition system substantially simultaneously, applicant's
arguments have been fully considered but they are not persuasive. Murveit et al. (US
6671669) fully anticipates all the claim limitations in that the input speech is sent to both
a speaker-dependent ASR (elements 112 and 118 in figure 1) and a speaker-independent
ASR (elements 114, 116, 120, and 122 in figure 1). Decision logic (element 124 in
figure 1) determines the best recognition result from the three recognition results
provided at the output of the three speech recognizers. Therefore, the previous ground of
rejection is maintained.

Claim Rejections - 35 USC §112 

3. Claims 1-14, 19-22, and 31 are rejected under 35 U.S.C. 112, first paragraph, as 
failing to comply with the enablement requirement. The claim(s) contains subject matter 
which was not described in the specification in such a way as to enable one skilled in 
the art to which it pertains, or with which it is most nearly connected, to make and/or use 
the invention. Applicant's invention is drawn to a speech recognition system, wherein 



Application/Control Number: 10/040,406 Page 4 

Art Unit: 2655 

the input speech is analyzed to identify the speaker and, in response to the identified
speaker, speech models corresponding to that speaker are retrieved for use in a speech
recognizer. Referring to figure 3, the speech recognition results from a plurality
of speech recognizers are compared with each other to determine the best result based
on the confidence score. However, neither figure 3 nor the specification
suggests or indicates that each of the plurality of speech recognizers in figure 3 uses a
speaker model corresponding to a different one of the speakers (such recognizers are
commonly referred to as speaker-specific speech recognizers). Furthermore, figure 4
shows two different types of speech recognizers, a speaker-independent ASR and a
speaker-dependent ASR. There is no connection between these two types of ASRs and
the plurality of ASRs shown in figure 3. Thus, the examiner concludes that the newly
added feature regarding using a speaker model corresponding to a different one of said
speakers is not supported by the specification. The examiner treats the amended
claims as reciting a plurality of speech recognizers connected in parallel that recognize a
common input speech and compare the recognition results to determine a single best
recognition result based on confidence score, and the prior art of record fully anticipates
the claim limitation.

Claim Rejections - 35 USC § 102 

4. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless -
(e) the invention was described in (1) an application for patent, published under section 122(b), by
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 

5. Claims 1-2, 4, 19-20, 22, and 31 are rejected under 35 U.S.C. 102(e) as being 
anticipated by Garudadri et al. (US Patent No. 6671669). 

6. Regarding claim 1, Garudadri et al. disclose a method for transcribing speech of
a plurality of speakers, comprising: providing said speech to a plurality of speech
decoders, each of said decoders using a speaker model for one of said speakers and
generating a confidence score for each decoded output (figure 1 or referring to col. 8,
ln. 33-67); and selecting a decoded output based on said confidence score (the process
of figures 7-9).

7. Regarding claim 19, Garudadri et al. disclose a system for transcribing speech of 
a plurality of speakers, comprising: 

a memory that stores computer-readable code (col. 13, ln. 18-60); and a
processor operatively coupled to said memory, said processor configured to implement
said computer-readable code (col. 13, ln. 18-60), said computer-readable code
configured to: 

provide said speech to a plurality of speech decoders, each of said decoders 
using a speaker model for one of said speakers and generating a confidence score for 
each decoded output (figure 1 or referring to col. 8, ln. 33-67); and select a decoded output
having a highest confidence score (the process of figures 7-9).

8. Regarding claim 31, Garudadri et al. disclose an article of manufacture for
transcribing speech of a plurality of speakers, comprising:

a computer readable medium having computer readable code means embodied
thereon (col. 13, ln. 18-60), said computer readable program code means comprising:

a step to provide said speech to a plurality of speech decoders, each of said
decoders using a speaker model for one of said speakers and generating a
confidence score for each decoded output (figure 1 or referring to col. 8, ln. 33-67); and a
step to select a decoded output having a highest confidence score (the process of
figures 7-9).

9. Regarding claims 2, 4, 20, and 22, Garudadri et al. further disclose the method 
and system of claims 1 and 19, further comprising the step of aligning each of said 
decoded outputs in time (the DTW Matching in elements 118, 120, and 122 in figure 1 is 
the aligning step), and the step of presenting said selected decoded output to a user 
(col. 8, ln. 64 to col. 9, ln. 2).

10. Claims 11-18, 23-30, and 32-33 are rejected under 35 U.S.C. 102(e) as being 
anticipated by Murveit et al. (US Patent No. 6671669). 

11. Regarding claim 11, Murveit et al. disclose a method for transcribing speech of a
plurality of speakers, comprising: providing said speech to a speaker independent
speech recognition system and a speaker specific speech recognition system (referring
to figures 3-5 or col. 4, ln. 1 to col. 5, ln. 67); and decoding said speech using said
speaker independent speech recognition system whenever the identity of the current
speaker is unknown (elements 206, 208, and 210 in figure 3).

12. Regarding claim 15, Murveit et al. disclose a method for transcribing speech of a 
plurality of speakers, comprising: providing said speech to a speaker independent 
speech recognition system and a speaker specific speech recognition system (referring 
to figures 3-5 or col. 4, ln. 1 to col. 5, ln. 67); and decoding said speech using said
speaker specific speech recognition system with a speaker model for an identified
speaker until there is a speaker change (the operation of figure 3: at first, when the
speaker using the system is speaker specific, the acoustic profile of that speaker is used
in the speech recognition process; however, when a different speaker, as detected by
the ID Speaker 204, uses the system, the system retrieves the profile of said different
speaker for use in the speech recognition process).

13. Regarding claim 23, Murveit et al. further disclose a system for transcribing 
speech of a plurality of speakers, comprising: 

a memory that stores computer-readable code (elements 104 and 106 in figure 2);
and a processor operatively coupled to said memory, said processor configured to
implement said computer-readable code (Processor 102 in figure 2), said
computer-readable code configured to:

provide said speech to a speaker independent speech recognition system and a
speaker specific speech recognition system (referring to figures 3-5 or col. 4, ln. 1 to col.
5, ln. 67); and decode said speech using said speaker independent speech recognition
system whenever the identity of the current speaker is unknown (elements 206, 208,
and 210 in figure 3).

14. Regarding claim 27, Murveit et al. disclose a system for transcribing speech of a 
plurality of speakers, comprising: 

a memory that stores computer-readable code (elements 104 and 106 in figure 2);
and a processor operatively coupled to said memory, said processor configured to 
implement said computer-readable code (Processor 102 in figure 2), said computer- 
readable code configured to: 

provide said speech to a speaker independent speech recognition system and a 
speaker specific speech recognition system (referring to figures 3-5 or col. 4, ln. 1 to col.
5, ln. 67); and decode said speech using said speaker specific speech recognition
system with a speaker model for an identified speaker until there is a speaker change
(the operation of figure 3: at first, when the speaker using the system is speaker specific,
the acoustic profile of that speaker is used in the speech recognition process; however,
when a different speaker, as detected by the ID Speaker 204, uses the system, the system
retrieves the profile of said different speaker for use in the speech recognition process).

15. Regarding claims 12-14, 17-18, 24-26, and 29-30, Murveit et al. further disclose
the method and system of claims 11, 15, 23, and 27, wherein said decoding step 
continues until a speaker identification system identifies an unknown speaker (the 
operation in figure 3 is a continuous process, speaker changes are continuously 
monitored by the ID Speaker 204), wherein one or more of said speaker independent 
speech recognition system and said speaker specific speech recognition system are on 
a remote server (Speech Recognition System in figure 2), and the step of presenting 
said selected decoded output to a user (col. 7, ln. 9-22).

16. Regarding claims 16 and 28, Murveit et al. further disclose the method and 
system of claims 15 and 27, further comprising the step of decoding said speech using 
a speaker independent speech recognition system until the identity of a speaker is 
determined and the appropriate speaker model is loaded (the operation of figure 3: at
first, when the speaker using the system is speaker specific, the acoustic profile of that
speaker is used in the speech recognition process; however, when a different speaker,
as detected by the ID Speaker 204, uses the system, the system retrieves the profile of
said different speaker for use in the speech recognition process).
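
As an illustration of the claimed behavior discussed in paragraphs 11 through 16 (decode with the speaker independent system while the speaker is unknown, and with a speaker specific model once a speaker is identified, until the speaker changes), a minimal sketch follows. It is an assumption-laden toy, not the method of Murveit et al. or of the application: `route_segments`, `identify`, and the decoder callables are hypothetical names, and speaker identification is assumed to return `None` when the speaker is unknown.

```python
from typing import Callable, Dict, List, Optional, Tuple

Decoder = Callable[[str], str]

# Label used when the speaker independent system handled a segment (hypothetical).
SPEAKER_INDEPENDENT = "speaker-independent"


def route_segments(segments: List[str],
                   identify: Callable[[str], Optional[str]],
                   speaker_models: Dict[str, Decoder],
                   speaker_independent: Decoder) -> List[Tuple[str, str]]:
    """Decode each segment with the model that matches the current speaker."""
    results: List[Tuple[str, str]] = []
    for segment in segments:
        speaker = identify(segment)  # None means the identity is unknown
        decoder = speaker_models.get(speaker) if speaker is not None else None
        if decoder is None:
            # Unknown speaker, or no model loaded yet: fall back to the
            # speaker independent system.
            results.append((SPEAKER_INDEPENDENT, speaker_independent(segment)))
        else:
            # Identified speaker: keep using that speaker's model; a speaker
            # change simply shows up as a different identify() result.
            results.append((speaker, decoder(segment)))
    return results
```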

17. Regarding claim 32, Murveit et al. disclose an article of manufacture for 
transcribing speech of a plurality of speakers, comprising: 

a computer readable medium having computer readable code means embodied
thereon (memory system 104 in figure 4), said computer readable program code means
comprising:

a step to provide said speech to a speaker independent speech recognition
system and a speaker specific speech recognition system (referring to figures 3-5 or col.
4, ln. 1 to col. 5, ln. 67); and

a step to decode said speech using said speaker independent speech 
recognition system whenever the identity of the current speaker is unknown (elements 
206, 208, and 210 in figure 3). 

18. Regarding claim 33, Murveit et al. disclose an article of manufacture for
transcribing speech of a plurality of speakers, comprising: 

a computer readable medium having computer readable code means embodied 
thereon (memory system 104 in figure 4), said computer readable program code means 
comprising: 

a step to provide said speech to a speaker independent speech recognition 
system and a speaker specific speech recognition system (referring to figures 3-5 or col.
4, ln. 1 to col. 5, ln. 67); and

a step to decode said speech using said speaker specific speech recognition
system with a speaker model for an identified speaker until there is a speaker change
(the operation of figure 3: at first, when the speaker using the system is speaker specific,
the acoustic profile of that speaker is used in the speech recognition process; however,
when a different speaker, as detected by the ID Speaker 204, uses the system, the system
retrieves the profile of said different speaker for use in the speech recognition process).

Claim Rejections - 35 USC § 103 

19. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

20. Claims 5-7 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Garudadri et al. (US Patent No. 6671669) in view of Baker (US Patent No. 6122613). 

21. Regarding claims 5-7, Garudadri et al. fail to specifically disclose the method of
claim 1, further comprising the step of manually selecting an alternate decoded output if
said assigned output is incorrect, the step of adapting said selecting step based on said 
manual selection, and the step of presenting several decoded outputs to a user with an 
indication of said corresponding confidence score. 

However, Baker teaches the step of manually selecting an alternate decoded
output if said assigned output is incorrect (col. 8, ln. 33 to col. 9, ln. 40), the step of
adapting said selecting step based on said manual selection (col. 9, ln. 18-40), and the
step of presenting several decoded outputs to a user with an indication of said
corresponding confidence score (col. 8, ln. 33 to col. 9, ln. 40).

Because Garudadri et al. and Baker are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at
the time of invention to modify Garudadri et al. by incorporating the teaching of Baker in
order to enable the user to correct misrecognized words and to train the system to
enhance subsequent recognitions.

22. Claims 3, 10, and 21 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Garudadri et al. (US Patent No. 6671669) in view of Murveit et al. (US Patent No. 
6671669). 

23. Regarding claim 10, Garudadri et al. fail to disclose the method of claim 1, 
wherein said selecting step further comprises the step of determining if a decoded 
output includes an isolated word from a second speaker in a string of words from a first 
speaker. However, Murveit et al. teach that the selecting step further comprises the 
step of determining if a decoded output includes an isolated word from a second 
speaker in a string of words from a first speaker (Speaker ID 204 in figure 3 determines 
speaker changes). 

Because Garudadri et al. and Murveit et al. are analogous art from
the same field of endeavor, it would have been obvious to one of ordinary skill in the
art at the time of invention to modify Garudadri et al. by incorporating the teaching of
Murveit et al. in order to specify the system to use speaker-specific profile in recognizing
speech to enhance speech recognition accuracy.

24. Regarding claims 3 and 21, Garudadri et al. fail to specifically disclose the
method and system of claims 1 and 19, wherein one or more of said speech decoders 
are on a remote server. However, Murveit et al. teach that one or more of said speech 
decoders are on a remote server (Speech Recognition System 100 in figure 2). 

Because Garudadri et al. and Murveit et al. are analogous art from
the same field of endeavor, it would have been obvious to one of ordinary skill in the
art at the time of invention to modify Garudadri et al. by incorporating the teaching of
Murveit et al. in order to distribute the processing load from the
limited-processing-capability device.

25. Claim 8 is rejected under 35 U.S.C. 103(a) as being unpatentable over Garudadri 
et al. (US Patent No. 6671669) in view of Chao Chang et al. (US Patent No. 6567778). 

26. Regarding claim 8, Garudadri et al. fail to specifically disclose the method of 
claim 1, further comprising the step of presenting said decoded output as a string of
words if said corresponding confidence score exceeds a certain threshold and as a 
string of phones if said corresponding confidence score is below a certain threshold. 
However, Chao Chang et al. teach the step of presenting said decoded output as a 
string of words if said corresponding confidence score exceeds a certain threshold and 
as a string of phones if said corresponding confidence score is below a certain 
threshold (element 116 in figure 1, if the confidence score is lower than the threshold, 
the user is queried to re-input the low confidence score words. Thus, the query must
represent a series of phones).

Because Garudadri et al. and Chao Chang et al. are analogous art
from the same field of endeavor, it would have been obvious to one of ordinary skill in
the art at the time of invention to modify Garudadri et al. by incorporating the teaching of
Chao Chang et al. in order to allow the user to correct misrecognized words to enhance
speech recognition in subsequent recognitions.
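
For illustration, the claim 8 limitation (present the decoded output as a string of words when its confidence score exceeds a threshold, and as a string of phones when it is below the threshold) can be sketched as below. This is a hypothetical sketch only and does not reproduce Chao Chang et al. or the application; `present_output` and its parameters are assumed names, and because the claim language does not say what happens when the score equals the threshold, this sketch assumes the phone string is shown in that case.

```python
from typing import List


def present_output(words: List[str], phones: List[str],
                   confidence: float, threshold: float) -> str:
    """Choose the display form of one decoded output by its confidence score."""
    if confidence > threshold:
        # Score exceeds the threshold: show the word string.
        return " ".join(words)
    # Score at or below the threshold (the equal case is an assumption):
    # show the phone string instead.
    return " ".join(phones)
```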

Allowable Subject Matter 

27. Claim 9 is objected to as being dependent upon a rejected base claim, but would 
be allowable if rewritten in independent form including all of the limitations of the base 
claim and any intervening claims. The following is a statement of reasons for the 
indication of allowable subject matter: Garudadri et al. disclose a method and system 
that combines speech recognition engines and resolves any differences between the 
results of individual speech recognition engines (referring to figure 1). Murveit et al.
teach a speech recognition adaptation system that whenever a new speaker is 
encountered, the system uses the speaker independent models to carry out speech 
recognition. At the same time the speaker independent models are used to adapt the 
speech of the new speaker. The adapted speech for the speaker is stored in memory 
for use in subsequent speech recognition of this speaker (referring to figures 3-5). 
However, both Garudadri et al. and Murveit et al. fail to specifically disclose the step of 
"presenting said decoded output as a string of words for the decoded output having the 
highest confidence score and as phones or syllables for all other decoded outputs."
Furthermore, it would not have been obvious to one of ordinary skill in the art at the time
of invention to modify the prior art of record to realize the claimed invention.

Conclusion 

Applicant's amendment necessitated the new ground(s) of rejection presented in 
this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP 
§ 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 
CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Huyen Vo whose telephone number is 571-272-7631. 
The examiner can normally be reached on M-F, 9-5:30. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Doris To, can be reached on 703-305-4827. The fax phone number for the
organization where this application or proceeding is assigned is 703-872-9306.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 

Huyen X. Vo May 10, 2005